AITopics | zero-shot translation

Collaborating Authors

zero-shot translation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Domain-Specific Translation with Open-Source Large Language Models: Resource-Oriented Analysis

Wassie, Aman Kassahun, Molaei, Mahdi, Moslem, Yasmin

arXiv.org Artificial IntelligenceDec-8-2024

In this work, we compare the domain-specific translation performance of open-source autoregressive decoder-only large language models (LLMs) with task-oriented machine translation (MT) models. Our experiments focus on the medical domain and cover four language pairs with varied resource availability: English-to-French, English-to-Portuguese, English-to-Swahili, and Swahili-to-English. Despite recent advancements, LLMs exhibit a clear gap in specialized translation quality compared to multilingual encoder-decoder MT models such as NLLB-200. In three out of four language directions in our study, NLLB-200 3.3B outperforms all LLMs in the size range of 8B parameters in medical translation. While fine-tuning LLMs such as Mistral and Llama improves their performance at medical translation, these models still fall short compared to fine-tuned NLLB-200 3.3B models. Our findings highlight the ongoing need for specialized MT models to achieve higher-quality domain-specific translation, especially in medium-resource and low-resource settings. As larger LLMs outperform their 8B variants, this also encourages pre-training domain-specific medium-sized LMs to improve quality and efficiency in specialized translation tasks.

dataset, nllb-200 3, translation, (15 more...)

arXiv.org Artificial Intelligence

2412.05862

Country:

Asia > Singapore (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(10 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.46)
Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Add feedback

Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation

Qu, Zhi, Wang, Yiran, Ding, Chenchen, Tanaka, Hideki, Utiyama, Masao, Watanabe, Taro

arXiv.org Artificial IntelligenceDec-2-2024

Existing multilingual neural machine translation (MNMT) approaches mainly focus on improving models with the encoder-decoder architecture to translate multiple languages. However, decoder-only architecture has been explored less in MNMT due to its underperformance when trained on parallel data solely. In this work, we attribute the issue of the decoder-only architecture to its lack of language transfer capability. Specifically, the decoder-only architecture is insufficient in encoding source tokens with the target language features. We propose dividing the decoding process into two stages so that target tokens are explicitly excluded in the first stage to implicitly boost the transfer capability across languages. Additionally, we impose contrastive learning on translation instructions, resulting in improved performance in zero-shot translation. We conduct experiments on TED-19 and OPUS-100 datasets, considering both training from scratch and fine-tuning scenarios. Experimental results show that, compared to the encoder-decoder architecture, our methods not only perform competitively in supervised translations but also achieve improvements of up to 3.39 BLEU, 6.99 chrF++, 3.22 BERTScore, and 4.81 COMET in zero-shot translations.

artificial intelligence, machine translation, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.02101

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Africa > Niger (0.04)
Asia > Japan (0.04)
(11 more...)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

On the Shortcut Learning in Multilingual Neural Machine Translation

Wang, Wenxuan, Jiao, Wenxiang, Huang, Jen-tse, Tu, Zhaopeng, Lyu, Michael R.

arXiv.org Artificial IntelligenceNov-15-2024

In this study, we revisit the commonly-cited off-target issue in multilingual neural machine translation (MNMT). By carefully designing experiments on different MNMT scenarios and models, we attribute the off-target issue to the overfitting of the shortcuts of (non-centric, centric) language mappings. Specifically, the learned shortcuts biases MNMT to mistakenly translate non-centric languages into the centric language instead of the expected non-centric language for zero-shot translation. Analyses on learning dynamics show that the shortcut learning generally occurs in the later stage of model training, and multilingual pretraining accelerates and aggravates the shortcut learning. Based on these observations, we propose a simple and effective training strategy to eliminate the shortcuts in MNMT models by leveraging the forgetting nature of model training. The only difference from the standard training is that we remove the training instances that may induce the shortcut learning in the later stage of model training. Without introducing any additional data and computational costs, our approach can consistently and significantly improve the zero-shot translation performance by alleviating the shortcut learning for different MNMT models and benchmarks.

machine learning, natural language, translation, (18 more...)

arXiv.org Artificial Intelligence

2411.10581

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China > Hong Kong (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)
(3 more...)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Understanding and Mitigating the Uncertainty in Zero-Shot Translation

Wang, Wenxuan, Jiao, Wenxiang, Wang, Shuo, Tu, Zhaopeng, Lyu, Michael R.

arXiv.org Artificial IntelligenceOct-18-2024

Zero-shot translation is a promising direction for building a comprehensive multilingual neural machine translation~(MNMT) system. However, its quality is still not satisfactory due to off-target issues. In this paper, we aim to understand and alleviate the off-target issues from the perspective of uncertainty in zero-shot translation. By carefully examining the translation output and model confidence, we identify two uncertainties that are responsible for the off-target issues, namely, extrinsic data uncertainty and intrinsic model uncertainty. Based on the observations, we propose two lightweight and complementary approaches to denoise the training data for model training and explicitly penalize the off-target translations by unlikelihood training during model training. Extensive experiments on both balanced and imbalanced datasets show that our approaches significantly improve the performance of zero-shot translation over strong MNMT baselines.

large language model, natural language, translation, (20 more...)

arXiv.org Artificial Intelligence

2205.10068

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China > Hong Kong (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Translating Step-by-Step: Decomposing the Translation Process for Improved Translation Quality of Long-Form Texts

Briakou, Eleftheria, Luo, Jiaming, Cherry, Colin, Freitag, Markus

arXiv.org Artificial IntelligenceSep-10-2024

In this paper we present a step-by-step approach to long-form text translation, drawing on established processes in translation studies. Instead of viewing machine translation as a single, monolithic task, we propose a framework that engages language models in a multi-turn interaction, encompassing pre-translation research, drafting, refining, and proofreading, resulting in progressively improved translations. Extensive automatic evaluations using Gemini 1.5 Pro across ten language pairs show that translating step-by-step yields large translation quality improvements over conventional zeroshot prompting approaches and earlier humanlike baseline strategies, resulting in state-of-theart Figure 1: MetricX-23 quality improvements (where results on

source text, translation, translation process, (15 more...)

arXiv.org Artificial Intelligence

2409.0679

Country:

Asia > Singapore (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Language-Informed Beam Search Decoding for Multilingual Machine Translation

Yang, Yilin, Lee, Stefan, Tadepalli, Prasad

arXiv.org Artificial IntelligenceAug-11-2024

Beam search decoding is the de-facto method for decoding auto-regressive Neural Machine Translation (NMT) models, including multilingual NMT where the target language is specified as an input. However, decoding multilingual NMT models commonly produces ``off-target'' translations -- yielding translation outputs not in the intended language. In this paper, we first conduct an error analysis of off-target translations for a strong multilingual NMT model and identify how these decodings are produced during beam search. We then propose Language-informed Beam Search (LiBS), a general decoding algorithm incorporating an off-the-shelf Language Identification (LiD) model into beam search decoding to reduce off-target translations. LiBS is an inference-time procedure that is NMT-model agnostic and does not require any additional parallel data. Results show that our proposed LiBS algorithm on average improves +1.1 BLEU and +0.9 BLEU on WMT and OPUS datasets, and reduces off-target rates from 22.9\% to 7.7\% and 65.8\% to 25.3\% respectively.

algorithm, arxiv preprint arxiv, translation, (12 more...)

arXiv.org Artificial Intelligence

2408.05738

Country:

Europe > France (0.04)
North America > United States > Oregon (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Improving Multilingual Neural Machine Translation by Utilizing Semantic and Linguistic Features

Bu, Mengyu, Gu, Shuhao, Feng, Yang

arXiv.org Artificial IntelligenceAug-2-2024

The many-to-many multilingual neural machine translation can be regarded as the process of integrating semantic features from the source sentences and linguistic features from the target sentences. To enhance zero-shot translation, models need to share knowledge across languages, which can be achieved through auxiliary tasks for learning a universal representation or cross-lingual mapping. To this end, we propose to exploit both semantic and linguistic features between multiple languages to enhance multilingual translation. On the encoder side, we introduce a disentangling learning task that aligns encoder representations by disentangling semantic and linguistic features, thus facilitating knowledge transfer while preserving complete information. On the decoder side, we leverage a linguistic encoder to integrate low-level linguistic features to assist in the target language generation. Experimental results on multilingual datasets demonstrate significant improvement in zero-shot translation compared to the baseline system, while maintaining performance in supervised translation. Further analysis validates the effectiveness of our method in leveraging both semantic and linguistic features. The code is available at https://github.com/ictnlp/SemLing-MNMT.

linguistic feature, representation, translation, (14 more...)

arXiv.org Artificial Intelligence

2408.01394

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(9 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation

Qu, Zhi, Ding, Chenchen, Watanabe, Taro

arXiv.org Artificial IntelligenceJun-12-2024

Understanding representation transfer in multilingual neural machine translation can reveal the representational issue causing the zero-shot translation deficiency. In this work, we introduce the identity pair, a sentence translated into itself, to address the lack of the base measure in multilingual investigations, as the identity pair represents the optimal state of representation among any language transfers. In our analysis, we demonstrate that the encoder transfers the source language to the representational subspace of the target language instead of the language-agnostic state. Thus, the zero-shot translation deficiency arises because representations are entangled with other languages and are not transferred effectively to the target language. Based on our findings, we propose two methods: 1) low-rank language-specific embedding at the encoder, and 2) language-specific contrastive learning of the representation at the decoder. The experimental results on Europarl-15, TED-19, and OPUS-100 datasets show that our methods substantially enhance the performance of zero-shot translations by improving language transfer capacity, thereby providing practical evidence to support our conclusions.

encoder, representation, translation, (15 more...)

arXiv.org Artificial Intelligence

2406.08092

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(8 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

LCS: A Language Converter Strategy for Zero-Shot Neural Machine Translation

Sun, Zengkui, Liu, Yijin, Meng, Fandong, Xu, Jinan, Chen, Yufeng, Zhou, Jie

arXiv.org Artificial IntelligenceJun-5-2024

Multilingual neural machine translation models generally distinguish translation directions by the language tag (LT) in front of the source or target sentences. However, current LT strategies cannot indicate the desired target language as expected on zero-shot translation, i.e., the off-target issue. Our analysis reveals that the indication of the target language is sensitive to the placement of the target LT. For example, when placing the target LT on the decoder side, the indication would rapidly degrade along with decoding steps, while placing the target LT on the encoder side would lead to copying or paraphrasing the source input. To address the above issues, we propose a simple yet effective strategy named Language Converter Strategy (LCS). By introducing the target language embedding into the top encoder layers, LCS mitigates confusion in the encoder and ensures stable language indication for the decoder. Experimental results on MultiUN, TED, and OPUS-100 datasets demonstrate that LCS could significantly mitigate the off-target issue, with language accuracy up to 95.28%, 96.21%, and 85.35% meanwhile outperforming the vanilla LT strategy by 3.07, 3,3, and 7.93 BLEU scores on zero-shot translation, respectively.

computational linguistic, translation, zero-shot translation, (16 more...)

arXiv.org Artificial Intelligence

2406.02876

Country:

North America > Dominican Republic (0.04)
Asia > China > Beijing > Beijing (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(7 more...)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning

Zan, Changtong, Ding, Liang, Shen, Li, Zhen, Yibing, Liu, Weifeng, Tao, Dacheng

arXiv.org Artificial IntelligenceMar-21-2024

Translation-tailored Large language models (LLMs) exhibit remarkable translation capabilities, even competing with supervised-trained commercial translation systems. However, off-target translation remains an unsolved problem, especially for low-resource languages, hindering us from developing accurate LLMs-based translation models. To mitigate the off-target translation problem and enhance the performance of LLMs on translation, recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs by feeding few-shot demonstrations. However, these methods essentially do not improve LLM's ability to follow translation instructions, especially the language direction information. In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs. Specifically, we first tune LLMs with the maximum likelihood estimation loss on the translation dataset to elicit the basic translation capabilities. In the second stage, we construct instruction-conflicting samples by randomly replacing the translation directions with a wrong one within the instruction, and then introduce an extra unlikelihood loss to learn those samples. Experiments on IWSLT and WMT benchmarks upon the LLaMA model spanning 16 zero-shot directions show that, compared to the competitive baseline -- translation-finetuned LLama, our method could effectively reduce the off-target translation ratio (averagely -53.3\%), thus improving translation quality with average +5.7 SacreBLEU and +16.4 BLEURT. Analysis shows that our method could preserve the model's general task performance on AlpacaEval. Code and models will be released at \url{https://github.com/alphadl/LanguageAware_Tuning}.

arxiv preprint, instruction-conflicting sample, translation, (15 more...)

arXiv.org Artificial Intelligence

2403.14399

Country: Asia > China (0.05)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

Add feedback